Leveraging Known Semantics for Spelling Correction
نویسندگان
چکیده
Focusing on applications for analyzing learner language which evaluate semantic appropriateness and accuracy, we build from previous work which modeled some aspects of interaction, namely a picture description task (PDT), with the goal of integrating a spelling correction component in this context. After parsing a sentence and extracting semantic relations, a surprising number of analysis failures stem from misspellings, deviating from expected input in ways that can be modeled when the content of the interaction is known. We thus explore the use of spelling correction tools and language modeling to correct misspellings that often lead to errors in obtaining semantic forms, and we show that such tools can significantly reduce the number of unanalyzable cases. The work is useful for any context where image descriptions or some expected content is available, but not necessarily expected linguistic forms.
منابع مشابه
Design and implementation of Persian spelling detection and correction system based on Semantic
Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...
متن کاملImproving Query Spelling Correction Using Web Search Results
Traditional research on spelling correction in natural language processing and information retrieval literature mostly relies on pre-defined lexicons to detect spelling errors. But this method does not work well for web query spelling correction, because there is no lexicon that can cover the vast amount of terms occurring across the web. Recent work showed that using search query logs helps to...
متن کاملData Analysis Project: Leveraging Massive Textual Corpora Using n-Gram Statistics
We study methods of efficiently leveraging massive textual corpora through n-gram statistics. Specifically, we explore algorithms that use a database of frequency counts for sequences of tokens in a teraword Web corpus to correct spelling mistakes and to extract a list of instances of some category given only the name of the target category. For spelling correction, we use a novel correction al...
متن کاملSENSEABLE SEARCH: Selective Query Disambiguation
We present a method for detecting and resolving lexical ambiguity in information retrieval queries. Leveraging existing word sense disambiguation tools, we define a measure of query term ambiguity based on the distribution of word senses in the relevant document set. If a query term is ambiguous, we allow the user to select the correct sense of the query term, in the style of Google’s spelling ...
متن کاملExtended HMM and Ranking Models for Chinese Spelling Correction
Spelling correction has been studied for many decades, which can be classified into two categories: (1) regular text spelling correction, (2) query spelling correction. Although the two tasks share many common techniques, they have different concerns. This paper presents our work on the CLP-2014 bake-off. The task focuses on spelling checking on foreigner Chinese essays. Compared to online sear...
متن کامل